Identification of disease propagation paths in two-layer networks

To determine the path of disease in different types of networks, a new method based on compressive sensing is proposed for identifying the disease propagation paths in two-layer networks. If a limited amount of data from network nodes is collected, according to the principle of compressive sensing, it is feasible to accurately identify the path of disease propagation in a multilayer network. Experimental results show that the method can be applied to various networks, such as scale-free networks, small-world networks, and random networks. The impact of network density on identification accuracy is explored. The method could be used to aid in the prevention of disease spread.


Guangjun Li 1,2*, Gang Liu 3, Xiaoqun Wu 3,4 & Lei Pan 2,4
Complex network research is an active and growing field. Its scope keeps expanding, from protein interaction networks to biological networks, from scientific research networks to social networks, from information transmission networks to disease propagation networks, and from transportation networks to logistical networks. Research topics include synchronization 1,2, robustness 3, node importance 4, structure identification, and stability 5, among others [6][7][8][9]. The simulation of disease models is one of these topics and is an important component of complex network research [10][11][12].
Research on disease models in complex networks has two aspects: disease propagation and propagation path identification. Single-layer network models of disease were the first to be investigated 13. The models involved include SIS (susceptible-infected-susceptible) [10][11][12][13], SIR (susceptible-infected-removed) 12,14, SIRS (susceptible-infected-removed-susceptible) 15, SEIR (susceptible-exposed-infected-recovered) 16, SEIRS (susceptible-exposed-infected-recovered-susceptible) 17, SIQRS (susceptible-infected-quarantined-recovered-susceptible) 18, SIVRS (susceptible-infected-variant-recovered-susceptible) 19,20, etc. Among the most studied topics are the propagation characteristics of the SIS model under infected-individual mobility 10, the SIS model with nonlinear infectivity and adaptive weights 13, the network-based SIS epidemic model with global behavior notes 21, the SIS epidemic model with infectious latency 22, and the epidemic situation of nodes on a square lattice 13. Path identification in the SIS model of a single-layer disease propagation network was investigated by Zhesi Shen et al. 23. For multilayer disease models, Dongmei Fan et al. investigated the influence of geometric correlation on epidemic propagation 24, Paulo Cesar Ventura da Silva et al. investigated epidemic propagation with consciousness and varied time scales 25, and Fuzhong Nian et al. simulated the MR-SIS propagation process in multirelational networks 26. Most of this research addresses models of disease propagation rather than identification of the propagation path. The small amount of literature on disease path identification focuses on single-layer disease networks, and literature on path identification in multilayer networks is lacking. This may be because path identification in multilayer disease networks is difficult.
However, identifying the disease propagation path in multilayer networks is critical for disease prevention and control 27. In this paper, we investigate the identification of propagation paths in a multilayer disease network.
Common topology identification methods include the Granger causality test 28, synchronization-based identification 29, and compressive sensing-based identification 30,31. The Granger method requires stochastic perturbations of the network nodes, which is not realistic in disease propagation networks. The synchronization-based method requires constructing an auxiliary network with a very general form and designing adaptive controllers. The compressive sensing-based method used in this paper needs neither of these additional conditions: only a small amount of measurement data is needed to identify the topology of a disease propagation network. Compressive sensing is a sampling theory that can acquire discrete samples of a sparse signal at a rate substantially lower than the Nyquist rate while still allowing distortion-free reconstruction of the signal. The compressive sensing-based method therefore requires less measured data and has a high identification accuracy 32,33. The basic model of compressive sensing is $Y = AX$, where $Y \in \mathbb{R}^m$, $A$ is an $m \times n$ matrix, $X \in \mathbb{R}^n$, and $m \ll n$. When $X$ contains enough zero elements, only a few values of $Y$ and $A$ need to be measured to identify $X$ reliably. The aim of this paper is to use compressive sensing to identify the topology of disease spread in multilayer networks.
Figure 1 depicts a typical two-layer disease model. The upper layer shows the spread of consciousness (awareness) in the crowd 34, where nodes represent individuals and edges between nodes indicate information propagation paths. Information can spread through contacts via Facebook, WeChat, and other social media platforms. The model used in the upper layer is UAU (unaware-aware-unaware), where U represents an individual who is unaware of the disease and A represents one who is aware. After acquiring information from those who are aware, an unaware individual becomes aware with probability $\lambda$.
The state of a node is A (aware) if the node is aware of the disease; otherwise, its state is U (unaware). The lower layer depicts the spread of disease among humans, with nodes representing people and edges between nodes representing disease propagation paths. We use the conventional SIS model in the lower layer, where S represents the susceptible population and I represents the infected population. In the lower layer, we use $S(i)$ to denote the status of node $i$ as follows:

$$S(i) = \begin{cases} 1, & \text{node } i \text{ is infected}, \\ 0, & \text{node } i \text{ is susceptible}, \end{cases}$$

and the state of an arbitrary node $i$ in the upper layer is denoted as $M(i)$, where

$$M(i) = \begin{cases} 1, & \text{node } i \text{ is aware}, \\ 0, & \text{node } i \text{ is unaware}. \end{cases}$$

Method
We use $a_{ij}$ and $b_{ij}$ to denote the edges between nodes in the upper and lower networks, respectively: if there is an edge between node $i$ and node $j$, then $a_{ij} = 1$ or $b_{ij} = 1$; otherwise, $a_{ij} = 0$ or $b_{ij} = 0$. If node $i$ is unaware of the disease during disease propagation, it may become aware after being notified by surrounding nodes; the probability of this transition is denoted by $r_i(t)$. The probability that node $i$ is infected by surrounding nodes at time $t$ is $q_i^{A01}(t)$ if it is susceptible and aware, and $q_i^{U01}(t)$ if it is susceptible and unaware. Writing $S_j(t)$ and $M_j(t)$ for the states of node $j$ at time $t$, these probabilities are

$$r_i(t) = 1 - (1 - \lambda)^{\sum_{j \ne i} a_{ij} M_j(t)}, \qquad q_i^{A01}(t) = 1 - (1 - p_i^A)^{\sum_{j \ne i} b_{ij} S_j(t)}, \qquad q_i^{U01}(t) = 1 - (1 - p_i^U)^{\sum_{j \ne i} b_{ij} S_j(t)},$$

where $p_i^A$ is the infection probability of node $i$ in the susceptible population if node $i$ is aware, $p_i^U$ is the infection probability of node $i$ if node $i$ is unaware, and $\lambda$ is the probability of node $i$ transitioning from unaware to aware when getting a notification. Meanwhile, node $i$ recovers with probability $\sigma$ after infection.

Taking logarithms of the infection probability of an aware node gives a relation that is linear in the unknown edges,

$$y_i(t) \equiv \frac{\ln\left[1 - q_i^{A01}(t)\right]}{\ln\left(1 - p_i^A\right)} = \sum_{j \ne i} b_{ij} S_j(t),$$

and the same relation holds with $q_i^{U01}(t)$ and $p_i^U$ when node $i$ is unaware. Writing this relation at $m$ different times $t_1, t_2, \cdots, t_m$, one obtains the following equation:

$$\underbrace{\begin{pmatrix} y_i(t_1) \\ y_i(t_2) \\ \vdots \\ y_i(t_m) \end{pmatrix}}_{Y} = \underbrace{\begin{pmatrix} S_1(t_1) & \cdots & S_{i-1}(t_1) & S_{i+1}(t_1) & \cdots & S_N(t_1) \\ S_1(t_2) & \cdots & S_{i-1}(t_2) & S_{i+1}(t_2) & \cdots & S_N(t_2) \\ \vdots & & \vdots & \vdots & & \vdots \\ S_1(t_m) & \cdots & S_{i-1}(t_m) & S_{i+1}(t_m) & \cdots & S_N(t_m) \end{pmatrix}}_{\Phi} \underbrace{\begin{pmatrix} b_{i1} \\ \vdots \\ b_{i,i-1} \\ b_{i,i+1} \\ \vdots \\ b_{iN} \end{pmatrix}}_{X} \tag{5}$$

where $\Phi_{m \times (N-1)}$ is defined by the states of the nodes other than node $i$.

To find the propagation paths between node $i$ and the other nodes in the disease model, we only need to solve for the vector $X$ in Eq. (5). Letting $i = 1, 2, \cdots, N$, we can identify the paths among all nodes. The problem is thus transformed into solving the equation $\Phi X = Y$, where $\Phi$ is an $m \times (N - 1)$ matrix, $Y$ is a known vector, and $X$ is the vector to be determined. According to the literature 23, $q_i^{A01}(t)$ and $q_i^{U01}(t)$ can be obtained based on big data. For a specific disease, $p_i^A$ and $p_i^U$ are known 23, and only the status of each node needs to be measured. When there are few nonzero elements in $X$, compressive sensing theory 35,36 states that only a small number of measurements is needed to solve for $X$ accurately. The sparsest solution of $\Phi X = Y$ is obtained by minimizing the number of nonzero components of $X$, i.e.,

$$\min_X \|X\|_0 \quad \text{subject to} \quad \Phi X = Y.$$

Classical Tikhonov regularization is used to solve $\Phi X = Y$ in order to obtain an accurate reconstruction and increase numerical stability. The problem is then approximated by a convex regularized optimization problem, in which the regularization parameter $\alpha$ is used to avoid large deviations from the optimal solution. This convex optimization problem is usually solved with the alternating direction method of multipliers (ADMM) 37.

Numerical simulation
We use different network structures in the simulation to verify the universality of our method of identifying propagation paths. It is assumed that a specific disease has spread throughout a community. The notations $p_i^A$ and $p_i^U$ denote the infection probability of a node in the aware and unaware states, respectively. In practice, $p_i^A < p_i^U$. $\lambda$ represents the probability that a node will become aware of the disease after being notified by any of its neighbours, and $\sigma$ represents the recovery probability. We set the path value to 1 if there is a disease propagation path between two nodes in the community, which corresponds to $b_{ij} = 1$.
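As a rough illustration of the reconstruction step described in the Method section, the following Python sketch recovers the edges of a single node from a limited number of state observations. Several things here are assumptions rather than details taken from the paper: the infection probability is taken to have the standard SIS form $q = 1 - (1 - p)^{\sum_j b_{ij} S_j(t)}$, the problem is posed as an L1-regularized least-squares problem, and it is solved with iterative soft-thresholding (ISTA) instead of the ADMM solver cited in the text, purely to keep the example short. The toy sizes, the names `ista`, `phi` and `b_true`, and the value of `alpha` are hypothetical.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def residual(phi, y, x):
    """Return phi @ x - y for a list-of-rows matrix phi."""
    return [sum(p * xj for p, xj in zip(row, x)) - yk for row, yk in zip(phi, y)]


def objective(phi, y, x, alpha):
    """0.5 * ||phi x - y||_2^2 + alpha * ||x||_1."""
    r = residual(phi, y, x)
    return 0.5 * sum(v * v for v in r) + alpha * sum(abs(v) for v in x)


def ista(phi, y, alpha=0.01, iters=1000):
    """Iterative soft-thresholding for the L1-regularized least-squares problem."""
    n = len(phi[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # phi^T phi, so 1 / lip is a safe (conservative) step size.
    lip = sum(p * p for row in phi for p in row) or 1.0
    x = [0.0] * n
    for _ in range(iters):
        r = residual(phi, y, x)
        grad = [sum(row[j] * rk for row, rk in zip(phi, r)) for j in range(n)]
        x = [soft_threshold(xj - g / lip, alpha / lip) for xj, g in zip(x, grad)]
    return x


# Hypothetical toy instance: node i has 3 neighbours among 30 other nodes,
# and only m = 20 < 30 observation times are available.
rng = random.Random(0)
n_others, m, p_aware = 30, 20, 0.4
b_true = [0.0] * n_others
for j in rng.sample(range(n_others), 3):
    b_true[j] = 1.0

# Each row of phi holds the 0/1 infection states of the other nodes at one time.
phi = [[1.0 if rng.random() < 0.3 else 0.0 for _ in range(n_others)] for _ in range(m)]

# Infection probability of node i at each time under the assumed SIS form.
q = [1.0 - (1.0 - p_aware) ** sum(s * b for s, b in zip(row, b_true)) for row in phi]

# Log-linearisation: each y value equals the number of infected neighbours.
y = [math.log(1.0 - qk) / math.log(1.0 - p_aware) for qk in q]

b_hat = ista(phi, y)
```

Whether `b_hat` matches `b_true` closely depends on the number of measurements, the sparsity of the row and the choice of `alpha`, so the sketch should be read as a starting point for experimentation rather than as a reproduction of the reported results.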
In the identification procedure, a path is regarded as existing if the identified value satisfies $b_{ij} \in [1 - \varepsilon, 1 + \varepsilon]$ and as nonexistent if $b_{ij} \in [-\varepsilon, \varepsilon]$. The value of $\varepsilon$ in this paper is 0.01. The TPR (true positive rate) and TNR (true negative rate) are indicators of identification accuracy: the TPR is the proportion of existing paths that are correctly identified, and the TNR is the proportion of nonexistent paths that are correctly identified.
The identification errors for nonzero (existing) and zero (nonexistent) edges are denoted by $E_{nz}$ and $E_z$, respectively.
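The thresholding rule and the two accuracy indicators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `classify` and `tpr_tnr` and the example values are hypothetical, and an identified value that falls in neither interval is treated as undecided and counted as incorrect for both rates, which the text does not specify.

```python
def classify(value, eps=0.01):
    """Return 1 (path exists), 0 (no path) or None (undecided) for an identified value."""
    if abs(value - 1.0) <= eps:
        return 1
    if abs(value) <= eps:
        return 0
    return None


def tpr_tnr(b_true, b_hat, eps=0.01):
    """True positive and true negative rates for paired true and identified entries."""
    labels = [classify(v, eps) for v in b_hat]
    pos = [lab for t, lab in zip(b_true, labels) if t == 1]
    neg = [lab for t, lab in zip(b_true, labels) if t == 0]
    # Undecided entries (None) count as misidentified in both rates.
    tpr = sum(lab == 1 for lab in pos) / len(pos) if pos else float("nan")
    tnr = sum(lab == 0 for lab in neg) / len(neg) if neg else float("nan")
    return tpr, tnr


# Hypothetical identified values for five candidate edges of one node.
b_true = [1, 0, 0, 1, 0]
b_hat = [0.995, 0.002, 0.3, 0.6, -0.004]
tpr, tnr = tpr_tnr(b_true, b_hat)
```

In this example one of the two existing paths and two of the three nonexistent paths fall inside their tolerance intervals, so the TPR is 1/2 and the TNR is 2/3.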
Identification paths of disease propagation in ER, WS, and BA networks. We first identify disease propagation paths in various types of networks using numerical simulation. The networks we chose are random networks (ER), small-world networks (WS), and scale-free networks (BA). These three networks are widely used to mimic real population relationships in disease propagation models 6,27,34. In practical terms, every person can be considered a node, and connected nodes represent people who know each other. In a WS network, a great number of paths link the nodes. In a BA network, a few nodes have a great number of connections, while the majority of nodes have few connections. In an ER network with a given number of nodes, a path exists between each pair of nodes $u$ and $v$ with the same probability. The data ratio in the simulation refers to the proportion of actually observed data to the data necessary for a typical solution. The typical solution takes $N - 1$ measurements to solve for the solution vector $(b_{i1}, b_{i2}, \cdots, b_{i,i-1}, b_{i,i+1}, \cdots, b_{iN})^T$, as given in Eq. 6. Compressive sensing, however, requires only $k$ measurements ($k < N - 1$) to solve for the vector, and the data ratio is defined as the ratio of $k$ to $N - 1$. The parameters are as follows: the edge connection probability of all networks is 10%, and the total number of nodes in each of the three networks is 500. The infection probability is 20%, $p_i^A = 0.4$, $p_i^U = 0.7$, and $\sigma = 0.2$. The identification of the paths connecting all nodes is repeated independently 30 times in the simulation, and the average value is taken. The identification result is presented in Fig. 2, where the upper and lower bounds of the error bar on each data point represent the standard deviation of the result.
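To make the data ratio concrete, the short Python sketch below converts a data ratio into a number of measurements for the 500-node networks used here. The helper name `measurements_needed` is hypothetical, and rounding up to the next integer is an assumption, since the text does not say how fractional values of $k$ are handled.

```python
import math


def measurements_needed(n_nodes, data_ratio):
    """Smallest integer k such that k / (N - 1) is at least data_ratio."""
    return math.ceil(data_ratio * (n_nodes - 1))


k_40 = measurements_needed(500, 0.40)  # data ratio of 40%
k_50 = measurements_needed(500, 0.50)  # data ratio of 50%
```

With $N = 500$, a data ratio of 40% corresponds to about 200 measurements per node and a ratio of 50% to about 250, compared with the $N - 1 = 499$ measurements of the typical solution.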
The disease propagation paths of the three networks can be accurately determined, as shown in Fig. 2. The WS network gives the best identification performance, and the ER network comes second. This could be because the nodes in the WS network are more uniformly connected than those in the other networks. When the data ratio is 40%, the propagation paths of the WS and ER networks are reliably identified, as shown in Panels (a) and (b) of Fig. 2. The paths of the BA network are also accurately identified when the data ratio is 50%. The identification errors vary with the data ratio, as seen in Panels (c) and (d) of Fig. 2. Panel (c) shows the relative error of the existing paths, whereas Panel (d) shows the average absolute error of the nonexistent paths. When the data ratio is 40%, the average relative error and absolute error of the WS and ER networks are already very small: for the ER network they are zero and 0.009, respectively, and for the WS network both are zero. The BA network performs worst. When the data ratio is 50%, all three networks have zero error.
Impact of network density on identification. To investigate the impact of network density on identification, we simulate networks of the same kind but with different densities. In this case, we use the BA scale-free model. The density parameters of the network are $m = 2$, $m = 4$, and $m = 8$, where $m$ is the number of edges attached to each new node when the network is generated. The corresponding network densities are 3.92%, 7.68%, and 14.72%, respectively. The number of network nodes is set to 100, and the rest of the settings are the same as in the preceding subsection. Fig. 3 depicts the identification result. As shown in Fig. 3, propagation paths can be accurately identified at all three path densities. However, at a given data ratio, the identification accuracy decreases as the path density of the network increases. When the data ratio is 20%, the TPR at the three path densities is 90%, 60%, and 7%, respectively, as shown in Panel (a) of Fig. 3. The TNR is better than the TPR, as shown in Panel (b) of Fig. 3, but it also diminishes as the path density increases: it is 98%, 94%, and 86% when the data ratio is 20%. The relative and absolute errors of identification decrease rapidly as the data ratio increases, as seen in Panels (c) and (d) of Fig. 3. The absolute error is reduced to zero when the data ratio is 60%, and the relative error is reduced to zero when the data ratio is 70%.
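The three densities quoted above can be reproduced with a short calculation. The text does not state how density is computed, so the sketch below rests on two assumptions that happen to match the reported figures: a BA network with $N$ nodes and parameter $m$ is taken to have $m(N - m)$ edges, and density is taken as $2E / N^2$. The function name `ba_density` is hypothetical.

```python
def ba_density(n_nodes, m):
    """Density 2E / N^2 of a BA network, assuming E = m * (N - m) edges."""
    edges = m * (n_nodes - m)
    return 2 * edges / n_nodes ** 2


densities = [ba_density(100, m) for m in (2, 4, 8)]
```

For $N = 100$ this gives 0.0392, 0.0768 and 0.1472, i.e. the 3.92%, 7.68% and 14.72% reported for $m = 2$, $4$ and $8$.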
The compressive sensing method has the advantage of identifying unknown paths with less data. Only a small amount of monitoring data is required to determine the propagation path of the disease. Even with a low data ratio, the path along which the disease propagates can still be roughly identified, as Fig. 3 illustrates. In the BA network with 3.92% density, only 20% of the data is needed for the identification accuracy to reach 90%. When the network density is 14.72%, approximately 40% of the monitoring data is required, and the identification accuracy is approximately 80%.

Conclusion
This paper uses compressive sensing to accurately identify disease propagation paths in multilayer networks from only a small amount of measurement data. According to the numerical experiments, the method can accurately identify the paths in many types of networks, including ER, WS, and BA networks, and it performs best in the WS network. The identification performance for BA networks with various densities has also been examined. The results reveal that as network density increases, the accuracy of identification decreases and the error increases. The path of disease propagation can still be accurately identified provided that enough measurement data are added. This method could help in the prevention of disease epidemics in the general population.

Data availability
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.